Skip to content

Refactor BitwiseXOR operator using ScalarUDF framework#20086

Draft
Acfboy wants to merge 1 commit intoapache:mainfrom
Acfboy:rewrite-xor-to-function
Draft

Refactor BitwiseXOR operator using ScalarUDF framework#20086
Acfboy wants to merge 1 commit intoapache:mainfrom
Acfboy:rewrite-xor-to-function

Conversation

@Acfboy
Copy link

@Acfboy Acfboy commented Jan 31, 2026

Which issue does this PR close?

Rationale for this change

This PR is a pilot implementation of #20018. By refactoring operator into the ScalarUDF framework, we aim to:

  1. Move operator type handling from the physical execution phase to the logical plan phase, consistent with scalar functions.
  2. Simplify the operator handling logic which is currently scattered across the codebase.

What changes are included in this PR?

Refactor BitwiseXOR operator handling using ScalarUDF framework

Are these changes tested?

Yes.

  • All existing integration tests pass.
  • Move some existing binary operator tests to the new UDF-based implementation.

Are there any user-facing changes?

No.

@github-actions github-actions bot added sql SQL Planner physical-expr Changes to the physical-expr crates functions Changes to functions implementation labels Jan 31, 2026
@Acfboy
Copy link
Author

Acfboy commented Jan 31, 2026

Hi @2010YOUY01, I've submitted the pilot pr for bitwise_xor as we discussed. Looking forward to your feedback!

The implementation logic for bitwise_or and bitwise_and is very similar to bitwise_xor. If this approach is acceptable, I plan to refactor those operators using macros in a follow-up step.

@Jefffrey
Copy link
Contributor

At what phase of planning do we do this replacement? I have a concern that we have optimizer rules based around binary exprs that are BitwiseXor ops that might be made invalid if we substitute it with a scalar UDF 🤔

@Acfboy
Copy link
Author

Acfboy commented Jan 31, 2026

Hi @Jefffrey, thanks for the feedback!

My apologies, I didn't consider this thoroughly enough. I realize now that making this change directly (during the initial conversion from statement to LogicalPlan) would break the optimizer. For instance, I haven't even migrated the logic from simplify_expr to the UDF's simplify method yet.

I will convert this pr to a draft for now and carefully study the impact of this change on existing optimization rules.

@Acfboy Acfboy marked this pull request as draft January 31, 2026 03:28
@2010YOUY01
Copy link
Contributor

2010YOUY01 commented Feb 6, 2026

Thank you for this POC; it helps clarify the issue.

I see two benefits to unifying operators and UDFs:

  1. Optimizer rules would no longer need to differentiate between expressions and UDFs, which could simplify many optimizer implementations.
  2. A unified Operator/UDF API: as described in Unifying Operator Handling with the Scalar Function Framework #20018, we already have good signature checking and user-friendly error message utilities for UDFs, but not for operators.

This PR’s approach is to replace operator inside Expr with UDFs during the SQL -> Logical Plan stage. Conceptually, I think this is the ideal way operator expr should have been implemented from the start. However, doing this now would require migrating all related optimizer implementations for that operator. Based on some code searching, I believe this is effectively infeasible: changes that involve deep optimizer refactoring are particularly hard to review, and at the moment there likely isn’t enough review capacity for such a large and complex change.

As a middle ground, I have another idea (this gives up 1 for the reason above, but still tries to achieve 2): operator structs that implement PhysicalExpr could be thin wrappers around ScalarUDF, for example:

#[derive(Debug, Clone, Eq)]
pub struct BinaryExpr {
    left: Arc<dyn PhysicalExpr>,
    op: Operator,
    right: Arc<dyn PhysicalExpr>,
    /// Specifies whether an error is returned on overflow or not
    fail_on_overflow: bool,
    inner: Arc<dyn ScalarUDF>,
}

Here, inner would handle the heavy lifting such as signature checking and execution, while the outer BinaryExpr would mainly delegate to inner to implement PhysicalExpr.

This way, operators and UDFs could share a consistent interface and reuse utilities like signature checking and user-friendly error messages. I think this approach is feasible, though I may be underestimating the difficulty since I haven’t worked extensively in these modules; I believe @Jefffrey has a deeper understanding here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation physical-expr Changes to the physical-expr crates sql SQL Planner

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants